Turkish Validation of the User Version of the Mobile Application Rating Scale

Objective : As the number of mobile health applications increases, quality assessment becomes a capital feature of any mobile application design. Besides the professional evaluation conducted before marketing the app, the perceptions of the subjects to whom is intended will determine the successful widespread dissemination. Hence, the implementation of a given app may be impaired by the lack of a validated translation and cross-cultural adaptation. We aimed to validate in the Turkish language the User Version of the Mobile Application Rating Scale, an English original scale designed to assess the quality of mobile health applications. Materials and methods: A well-established and predefined process of cross-cultural adaptation and translation to Turkish of the User Version of the Mobile Application Rating Scale according to the World Health Organization guidelines was performed using a common, readily available, free-of-charge application. Internal consistency and reliability were tested in a population sample by Cronbach’s α and rWG index, respectively. Results: The total User Version of the Mobile Application Rating Scale score had good internal consistency (Cronbach’s α = 0.87). Internal consistencies of its subscales were also acceptable: with Cronbach’s α of 0.71, 0.78, 0.71, and 0.73 for engagement, functionality, aesthetics, and information, respectively. Cronbach’s α of the satisfaction subscale was 0.46. The User Version of the Mobile Application Rating Scale total and subscales scores had a strong within-group agreement, all of them with rwg indexes between 0.78 and 0.87 over baseline to 1 month. Conclusion: The Turkish version of the User Version of the Mobile Application Rating Scale is consistent with the English original version and is a reliable and valid tool to assess the quality of mobile applications by Turkish users.


Introduction
Presently, there are more than 7 billion mobile subscribers in the world. 1 In 2017, there were more than 350 000 mobile health (mHealth) applications (apps) available in the app stores and at least 19% of mobile users have downloaded health-related apps on their smartphones. 2 While healthcare professionals are increasingly using mHealth apps as a source of information, diagnostic help, and in dynamic clinical scenarios to facilitate monitoring patient medication and symptom management, the primary target population for mHealth apps is the users. As of 2015, 82% of the US individuals aged 18-49 years owned an app-enabled mobile phone, and cross-sectional studies show that almost a third of the mobile users had downloaded a health-related app, mostly related to fitness and nutrition. [3][4][5] These figures underscore the magnitude and impact of mHealth apps in the current e-health environment. 6 The rapidly evolving mHealth app market leads to an overwhelming selection, which makes choosing the right apps increasingly challenging for users. Overall, information about the quality of apps used in a clinical setting is limited 7 and there is a lack of consensus on the methodology for app quality evaluation. 8,9 Overcoming these limitations could lead to increased confidence in the credibility and usefulness of mHealth apps among professionals and end-users.

Turkish Validation of the User Version of the Mobile Application Rating Scale
The Mobile Application Rating Scale (MARS), 10 developed in 2015, is a short, objective, and reliable tool for classifying and assessing the quality of health apps. It has been crossculturally adapted and validated in several languages [11][12][13] and successfully used to assess the quality of mHealth apps. [14][15][16][17] However, it requires a certain degree of health and e-Health expertise to ensure reliable and objective quality rating scores. A later simplified version was developed and validated, resulting in the User Version of the Mobile Application Rating Scale (uMARS), which can be used in large-scale trials, or research with end-users. 18 The usefulness of the uMARS in the evaluation of mobile apps incorporated in the clinical setting has been recently demonstrated. 19,20 While there is still a notable lack of content diversity of the Turkish mHealth apps-most of them address making doctor appointments, fertility issues, and general wellness-the market is gradually growing. However, the slow adoption among users and health professionals limits their impact and effectiveness. 21 Thus, there is an increasing need for a reliable, culturally valid instrument to facilitate quality evaluation and support the dissemination of Turkish health apps. The scale would allow professionals, researchers, and developers to conceptualize, evaluate, recommend, and disseminate health apps with increased confidence.
Urology is not an exception in this growing app market. In the frame of an international project developing and testing the utility of an App ("MyBPH") to monitor urinary symptoms, treatment compliance, and outcomes of patients affected by lower bladder outlet obstruction (BOO), the International Consortium of the study aimed also to assess the quality and ultimately the patients' perception of the app as the later will determine its utility and diffusion. The tool chosen to qualitatively assess the "MyBPH" App was the uMARS, of which there was not yet any existing cross-cultural translation and validation of the original English version to the Turkish language. Consequently, the present study aimed to translate, adapt, and cross-culturally validate a Turkish language version of the uMARS in the frame of the above-mentioned MyBPH project.

Material and Methods
The study consisted of a cross-sectional linguistic and crosscultural validation of uMARS to the Turkish language.

The uMARS
The uMARS consists of 20 items, organized in five subscales, to evaluate the quality of health apps. Every item is rated on a 5-point Likert scale ranging from 1 (poor) to 5 (excellent). Four subscales are objective and related to quality rating: engagement (5 items), functionality (4 items), aesthetics (3 items), information (4 items), and one subscale evaluates app quality subjectively (4 items). A further subscale of 6 items measures the user-perceived impact of the evaluated app. 18 An ethics committee approval was not necessary as this article was the result of the Turkish translation and validation of an already in-use international questionnaire. Voluntary participants had given their oral consent while answering the questions in the questionnaire so separate written consent was not required. Also, the questionnaire did not contain any medical information.

Cross-Cultural Adaptation and Translation
The methodology to complete the Turkish translation and crosscultural validation of the English uMARS followed the World Health Organization guidelines, 22 consisting of forwarding and backward translation, pilot testing, and accuracy evaluation to produce an accurate translation of the original scale ( Figure 1).
A multidisciplinary team including researchers, health experts, and a language professional (with a degree in English literature) constituted the team involved in the project. The translation and adaptation involved the following process: (i) the English version of uMARS was independently translated to the Turkish language by two native Turkish bilingual speakers fluent in English (one physician and one non-medical professional); (ii) these two Turkish translations were discussed among the research team including the 2 original Turkish native bilingual speakers for harmonization purposes resulting in a single Turkish translation keeping the meaning the closest possible to the original English version; (iii) this Turkish version was then back-translated into English by two different bilingual researchers; (iv) the result was compared to the original English version of the original English scale and their feedback was implemented; (v) the final Turkish version was then reviewed by 5 Turkish-speaking individuals external to the research group. They provided feedback on their understanding of the items and the appropriateness of the Turkish language. Their comments and suggestions were discussed among the research group; and (vi) the necessary modifications to approximate the Turkish version to keep the English meaning were done based on the

Main Points
• The User Version of the Mobile Application Rating Scale (uMARS) version of the Turkish language provides a reliable instrument to evaluate the general quality of applications (apps) in the Turkish language.
• uMARS is an easy-to-use questionnaire by untrained mobile app end-users.
• As the medical mobile app market rapidly grows in Turkey, the valıdated Turkish versıon of the uMARS represents a useful tool to assess future app qualıty.

Selection of the Mobile App for Validation
Once the Turkish language translation was ready, a mobile app was selected according to the following inclusion criteria: • Available in Google Play and Apple Store.
• Free of charge.
• Targeted at adults (≥18 years old) of any gender.
• Containing at least some features that allow fulfilling the core of the interaction to the medical counseling and intervention process (e.g., question and response, interchange of pictures or reports, and numerical results). • Available in the Turkish language.
At study inception, there were 529 apps in the Turkish language in the category Health & Fitness. Apps were categorized in: • Specific gender or medical condition (40.1%) focusing on specific medical conditions requiring expertise, baby care, or gender-biased in nature (e.g., ovulation control) • Fitness or well-being (57.8%) (e.g., yoga or weight loss) • Hospital or doctors' appointments exclusively (1.9%) Most of these apps required either knowledge of the condition or targeted a specific motivated population and their access was not for free. After discussion and consensus, the research team chose "WhatsApp"-a general communication app, accessible free of charge, and widely used across Turkey-for the validation testing. This choice was based on the presumption that WhatsApp can mimic the doctor-patient, therapist-client interaction apps, and that it was or could easily be used at least 1 month before the initiation of the first evaluation. Lastly, the team internally trialed the uMARS and thought that the uMARS scale could be applied to other types of apps, as long as they include at least some mHealth characteristics.

Testing Population Sample
The population for validation was selected among the possible "end-users" of the uMARS. The sample size necessary was estimated at 50-100 individuals after discussion with the first author of the original uMARS. 18 A call was made among fifth-year medical students of the Medipol University and medical workers of the Medipol University Hospital excluding physicians and the general population from the researchers' networks. All participants were native Turkish speakers with at least secondary school graduations. The Turkish translation of the uMARS was distributed by email to all participants or paper-printed on demand.
Each participant was asked to use (or have used) the app for at least 1 month before completing the Turkish uMARS twice (4-6 weeks apart). They were instructed on completing the questionnaire individually and without help. Participants emailed or handed in their scores at the beginning and the end of the evaluation period. For each round, a reminder was sent 2 weeks after the initial distribution of the Turkish uMARS.

Data Analysis
The internal consistency of the uMARS subscales and total score were calculated using Cronbach's α. Test-retest reliabilities were calculated for the subscales and total scores of the uMARS after the second round (i.e., a test-retest period of approximately 1 month). The within-group agreement (r WG index) was chosen due to the skewness of the responses. 21,22 Although the r WG index is more a measure of agreement than a reliability index, it was preferred over the more commonly used the intraclass correlation (ICC) index because of the low variance in the recorded answers range. 23 The vast majority of participants used a smaller range (3,4,5), therefore a unit in change is rather large within the overall range of answers (1)(2)(3)(4)(5). Furthermore, r WG is expected to perform better in Likert scale measurements and especially when a single group of participants is used than ICC. 24 Levels of interrater agreement are: lack of agreement = 0.00 to 0.30; weak agreement 0.31 to 0.50; moderate agreement 0.51 to 0.70; strong agreement 0.71 to 0.90 and very strong agreement 0.91 to 1.00. 24 Statistical analyses of the data were done by using the SPSS version 23 (IBM SPSS Corp.; Armonk, NY, USA).

Results
The Turkish uMARS was distributed by email on December 23, 2019 to a sample group of 111 participants, all ≥18 years old. The last questionnaire of the second round was received on May 4, 2020.
Overall, 83 (74.7%) participants completed the first round after at least 1 month of app use and 74 (66.6% %) completed the second round (i.e., a test-retest period of approximately 1 month). The characteristics of the responders are described in Table 1. All the responders were fluent in spoken and written Turkish language.
Mean scores were highly consistent over time: for section A (engagement) they were 3.90 and 3.94 for the first round and second round, respectively; for section B (functionality) they were 4.28 and 4.22, respectively; for section C (aesthetics) they were 3.98 and 4.08, respectively; and for section D (information) 3.80 and 3.83, respectively. The mean total mean score (A+B+C+D/4), 3.99 for the first and second rounds ( Table 2).

uMARS Internal Consistency and Test-Retest Reliability
Consistency was calculated based on the first-round score (83 participants). The total uMARS score had a good internal consistency (Cronbach's α = 0.87). Internal consistencies of its subscales were also acceptable (engagement α = 0.71; functionality α = 0.78; aesthetics α = 0.71; information α = 0.73) with a notable exception regarding satisfaction α that equaled 0.46, which is to be expected from a subjective scale.
Within-group agreement (r WG index) was strong for all subscales and the total uMARS score ( Table 2). The value of the item concerning payment was the only item negatively correlated.

Discussion
We hereby present the development and validation of the crosscultural adaptation of the Turkish uMARS. The results of this study indicate that the scale has a good internal consistency and test-retest reliability. Our data show that the 4 objective subscales of the Turkish version of the uMARS reflect the degree of correlation and strong agreement between measures or reliability. Most of the scores for all objective subscales also present strong reliability and validity levels. The satisfaction subscale was strongly influenced by large disparities in participant answers to question (#19) regarding the willingness to pay for the app. Interrater reliability within the group was equally high for all the questions except for question #19. Indeed, the question "would you pay for this app?" was the only one that affected the r WG of the subjective quality subscale and had a negative interrater agreement. As the app chosen for validation is broadly used and freely available, one would expect a high level of concordance between the two response rounds and among respondents. However, willingness to pay highly depends on socioeconomic variables, the specific context of the consumer, and the service provided. 24 However, our study participants ranged in socioeconomic status and, their ability or willingness to pay for an app may have been variable. Furthermore, considering the time frame of the first and second response rounds (beginning of 2020 with the ensuing COVID-19 pandemic), singular social, health, and economic factors may have played an additional and distinctive role in the intra-and inter-variability responses.
Previous research suggests that the dissemination of health apps without a proper quality evaluation may result in negative consequences for users (patients) and professionals (medical personnel). 25,26 Although there are several instruments designed to evaluate the quality of medical apps, a clear definition of the theoretical framework to test quality is yet lacking and a wide heterogeneity exists in the criteria used to determine app quality. 27 In this setting, uMARS offers a clear definition of the theoretical framework to test medical apps' quality and has been translated to several languages and externally validated. [28][29][30] In line with previous linguistic validations, the Turkish translation 240 of the uMARS aims to bridge the gap between developers and end-users. It can be used for consumer research, as well as a checklist by mHealth developers in ensuring the high quality of their products.
In the frame of a study on the usefulness and effectiveness of a benign prostatic hyperplasia (MyBPH) app, trialed simultaneously in several countries, we needed an instrument to assess the quality of our app in Turkish. 21 The choice of the uMARS as a quality measuring instrument was based on its robustness and simplicity when compared to other scales specifically designed to assess the quality of a medical app. 18 Moreover, it is designed for end-users irrespective of expertise.
Hitherto, after following the necessary steps for linguistic translation and cultural adaptation for the validation of the Turkish uMARS version, we chose a broadly used app ("WhatsApp") with which the general population was well acquainted. While the use of a non-medical app may be considered as a limitation, the main objective of this study was to translate, adapt, and cross-culturally validate a Turkish language version of the uMARS and it was not focused specifically on the scale's apps in mHealth research. Rather than a limitation, the use of WhatsApp for cross-cultural validation can also be seen as a broadening of the uMARS uses. The original uMARS questions/items are general and refer to quality features that are not specific to a medical App and that apply to any App design in any field. Lastly, the chosen App was at that moment widely used by the Turkish population may decrease the burden of familiarizing with a new App, did not require specific knowledge from the evaluators, and was cost-free.
Limitations of our study include the possible negative influence on the response to subjective item #19 and the potential effect of the socioeconomic imbalance, which occurred during the COVID-19 pandemic on user scores of this item. However, as stated above, the uMARS permits for the subjectivity of this subscale and aims to correct for this through the computation of an overall subjective score, not reported as a component of the mean quality score of an app. We believe that Turkish uMARS hereby presented is a reliable translation of the original scale and can be confidently used by researchers, developers, and health professionals and that it will facilitate further research and development of mHealth in Turkey offering professionals, researchers, and end-users a valid tool for and objective assessment of app's quality in the Turkish language. Currently, further research is being conducted by our team, testing the properties of the scale when applied to an app for monitoring symptoms and treatment of Turkish patients with BOO caused by symptomatic MyBPH.
To the best of our knowledge, this study presents the first-ever translation of a user-targeted app quality measuring scale in the Turkish language. As research in this area continues to grow, further studies should explore the concurrent validity of the uMARS, compared to other app quality instruments focusing specifically on health apps.
The Turkish translation and cross-cultural validation of the uMARS showed good consistency (Cronbach's α 0.87) and testretest reliability (0.87). Overall, the test-retest agreement was strong for all the objective subscales of the uMARS (engagement, functionality, aesthetics, and information) with r WG between 0.71 and 0.84. Although the subjective item (satisfaction) had a low consistency exclusively at the expenses of the question regarding payment, it also showed a strong test-retest agreement (r WG 0.78).